Iterative Universal Hash Function Generator for Minhashing
Abstract
Minhashing is a technique for estimating the Jaccard index between two sets by exploiting the probability of collision under a random permutation. To speed up the computation, the random permutation can be approximated by a universal hash function such as the $h_{a,b}$ function proposed by Carter and Wegman. A better estimate of the Jaccard index is obtained by using many such hash functions, generated at random. This paper devises a new iterative procedure for generating a set of $h_{a,b}$ functions that eliminates the need for a list of random values and avoids the multiplication operation during the calculation. The generated hash functions retain the properties of a universal hash function family, which is possible because of the random nature of feature occurrence in sparse datasets. Results show that the uniformity of the feature hashing is maintained while achieving a speedup of up to 1.38 over the traditional approach.
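For orientation, the traditional approach that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration and not the paper's iterative generator: the prime `P`, the seed, and all function names are assumptions made for this example, feature ids are assumed to be non-negative integers smaller than `P`, and the final reduction modulo the table size in Carter and Wegman's family is left out for simplicity.

```python
import random

# Assumed modulus: the Mersenne prime 2^31 - 1, larger than any feature id used here.
P = 2_147_483_647


def make_hash_params(k, seed=0):
    """Draw k random (a, b) pairs, one per hash function h_{a,b}(x) = (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]


def minhash_signature(features, params):
    """For each (a, b), keep the smallest hash value over the set of integer feature ids."""
    return [min((a * x + b) % P for x in features) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions on which the two signatures agree."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(s, t):
    """Exact Jaccard index |s & t| / |s | t|, for comparison with the estimate."""
    return len(s & t) / len(s | t)
```

Note that this baseline stores one `(a, b)` pair per hash function and performs one multiplication per feature and per hash function, which are the two costs the abstract says the proposed procedure removes. A typical use is to build `params` once, compute one signature per set, and compare `estimate_jaccard` on the signatures with `exact_jaccard` on the sets.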
Similar resources
Simple Pseudorandom Number Generator with Strengthened Double Encryption (Cilia)
A new cryptographic pseudorandom number generator Cilia is presented. It hashes real random data using an iterative hash function to update its secret state, and it generates pseudorandom numbers using a block cipher. Cilia is a simple algorithm that uses an improved variant of double encryption with additional security to generate pseudorandom numbers, and its performance is similar to double ...
Quantum Hashing via Classical $\epsilon$-universal Hashing Constructions
We define the concept of a quantum hash generator and offer a design, which allows one to build a large number of different quantum hash functions. The construction is based on composition of a classical ε-universal hash family and a given family of functions – quantum hash generators. The relationship between ε-universal hash families and error-correcting codes gives possibilities to build a la...
A Cookbook for Black-Box Separations and a Recipe for UOWHFs
We present a new framework for proving fully black-box separations and lower bounds. We prove a general theorem that facilitates the proofs of fully black-box lower bounds from a one-way function (OWF). Loosely speaking, our theorem says that in order to prove that a fully black-box construction does not securely construct a cryptographic primitive Q (e.g., a pseudo-random generator or a univer...
An Improved Hash Function Based on the Tillich-Zémor Hash Function
Using the idea behind the Tillich-Zémor hash function, we propose a new hash function. Our hash function is parallelizable and its collision resistance is implied by a hardness assumption on a mathematical problem. Also, it is secure against the known attacks. It is the most secure variant of the Tillich-Zémor hash function until now.
Leftover Hash Lemma, Revisited
The famous Leftover Hash Lemma (LHL) states that (almost) universal hash functions are good randomness extractors. Despite its numerous applications, LHL-based extractors suffer from the following two limitations: – Large Entropy Loss: to extract $v$ bits from distribution $X$ of min-entropy $m$ which are $\varepsilon$-close to uniform, one must set $v \le m - 2\log(1/\varepsilon)$, meaning that the entropy loss $L \stackrel{\text{def}}{=} m - v$...
Journal: CoRR
Volume: abs/1401.6124
Publication date: 2014